# Use an HTTPS CRAN mirror
options(repos = c(CRAN = "https://cran.rstudio.com"))
# CRAN packages: install only the ones that are missing
cran_pkgs <- c("devtools", "BiocManager", "circlize", "magick",
               "gprofiler2", "RCurl", "knitr", "ggplot2")
for (pkg in cran_pkgs) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
# Bioconductor packages must be installed with BiocManager, not install.packages()
bioc_pkgs <- c("ComplexHeatmap", "limma", "edgeR", "RCy3")
missing_bioc <- bioc_pkgs[!vapply(bioc_pkgs, requireNamespace, logical(1),
                                  quietly = TRUE)]
if (length(missing_bioc) > 0) BiocManager::install(missing_bioc)
## Load Required Packages
library(ComplexHeatmap)
library(circlize)
library(knitr)
library(limma)
library(edgeR)
library(ggplot2)
library(magick)
library(gprofiler2)
library(RCurl)
library(RCy3)
Anti-seed PNAs targeting multiple oncomiRs for brain tumor therapy [@wang2023anti]
The authors showed that BNPs loaded with anti-seed sγPNAs targeting multiple oncomiRs are a promising approach to improving the treatment of glioblastoma (GBM). The approach also has the potential to personalize treatment based on tumor-specific oncomiRs [@wang2023anti].
Our dataset was obtained from GEO (accession GSE217366). It comes from the article “Anti-seed PNAs targeting multiple oncomiRs for brain tumor therapy” [@wang2023anti].
In part 1 of the project we cleaned the dataset, normalized it and mapped it to HUGO symbols. The results of these steps are summarized as follows:

1. Cleaning: we removed duplicated genes and defined new groups for our samples. 21344 of the initial 35741 genes remained after removing duplicates.
2. Normalization: we normalized the data with TMM (trimmed mean of M-values). No outliers were removed after normalization.
3. Mapping: we mapped the Ensembl IDs in the gene_id column to HUGO symbols. A total of 42 genes could not be mapped. These genes and their expression data are listed at the end of part 1.

In part 2 we performed differential expression analysis (DEA) and a preliminary over-representation analysis (ORA). In the treatment group (PNA-10b+21), 361 genes were upregulated and 890 genes were downregulated. The most upregulated pathways were associated with cell division and the cell cycle. The most downregulated pathways were associated with response to hypoxia. The volcano plot and heatmap summarizing our analysis are presented.
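Counts of up- and downregulated genes like these come from splitting the significant genes by the sign of their fold change. The base-R snippet below is a minimal sketch of that step. It assumes a hypothetical DE table with `logFC` and `FDR` columns (the column names edgeR's `topTags` reports) and an illustrative cutoff of 0.01. The toy values are made up and do not reproduce the numbers above.

```r
# Hypothetical toy DE results (made-up values, not the real dataset)
de_results <- data.frame(
  gene  = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5", "GENE6"),
  logFC = c(2.1, -1.5, 0.3, -3.0, 1.2, -0.8),
  FDR   = c(0.001, 0.004, 0.20, 0.0005, 0.03, 0.008),
  stringsAsFactors = FALSE
)

# Keep significant genes, then split them by direction of change
split_de_genes <- function(res, fdr_cutoff = 0.01) {
  sig <- res[res$FDR < fdr_cutoff, ]
  list(
    up   = sig$gene[sig$logFC > 0],
    down = sig$gene[sig$logFC < 0]
  )
}

de <- split_de_genes(de_results, fdr_cutoff = 0.01)
cat("Upregulated:", length(de$up), "Downregulated:", length(de$down), "\n")
```

The direction is relative to the contrast used in the DE model. Here, a positive logFC means higher expression in the treatment group.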
Figure 1. Heatmap of top hits with p-value < 0.05. Note that samples from the same group cluster together.
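A heatmap like this is usually drawn from row-scaled (z-scored) expression values, so that genes with very different expression levels share one color scale. The sketch below does the following:

- uses a made-up expression matrix and made-up p-values;
- keeps the rows with p < 0.05 and z-scores each row;
- draws the result with ComplexHeatmap, but only if ComplexHeatmap and circlize are installed.

The sample names and color breaks are assumptions, not the settings used for Figure 1.

```r
# Toy normalized expression matrix (genes x samples); values are made up
set.seed(1)
expr <- matrix(rnorm(40, mean = 5), nrow = 10,
               dimnames = list(paste0("GENE", 1:10),
                               c("ctrl_1", "ctrl_2", "pna_1", "pna_2")))
pvals <- c(0.001, 0.2, 0.03, 0.04, 0.5, 0.01, 0.07, 0.002, 0.9, 0.045)

# Keep top hits (p < 0.05) and z-score each row so genes share one scale
top_expr <- expr[pvals < 0.05, , drop = FALSE]
z <- t(scale(t(top_expr)))

# Draw only if the plotting packages are available
if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("circlize", quietly = TRUE)) {
  col_fun <- circlize::colorRamp2(c(-2, 0, 2), c("blue", "white", "red"))
  ht <- ComplexHeatmap::Heatmap(z, name = "z-score", col = col_fun,
                                cluster_columns = TRUE,
                                show_row_names = TRUE)
  ComplexHeatmap::draw(ht)
}
```

Clustering the columns is what lets samples from the same group end up next to each other.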
1. Methods and gene set. I used GSEA version 4.3.2 for the analysis [@subramanian2005gene; @mootha2003pgc]. The gene set file was “Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol.gmt”, downloaded manually from the Bader Lab (http://download.baderlab.org/EM_Genesets/current_release/Human/symbol/).
2. Summary of enrichment results. The analysis used the following parameters:
   - number of permutations: 1000;
   - collapse: No_collapse;
   - minimum gene set size: 15;
   - maximum gene set size: 500.
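If GSEA is run in preranked mode, its input is a two-column .rnk file listing all tested genes, sorted by a ranking metric. This minimal sketch rests on three assumptions:

- the ranking metric is the common choice sign(logFC) × -log10(p-value);
- the input is a hypothetical DE table with `logFC` and `PValue` columns;
- the output file name ranked_genes.rnk is illustrative only.

```r
# Hypothetical toy DE results covering ALL tested genes (GSEA needs the full list)
de_results <- data.frame(
  gene   = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5"),
  logFC  = c(2.1, -1.5, 0.3, -3.0, 1.2),
  PValue = c(1e-4, 1e-3, 0.5, 1e-6, 0.01),
  stringsAsFactors = FALSE
)

# Signed significance: direction from logFC, magnitude from -log10(p)
de_results$rank <- sign(de_results$logFC) * -log10(de_results$PValue)

# Sort from most upregulated to most downregulated
ranked <- de_results[order(de_results$rank, decreasing = TRUE), c("gene", "rank")]

# Two tab-separated columns, no header, no quotes
rnk_file <- file.path(tempdir(), "ranked_genes.rnk")
write.table(ranked, rnk_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Unlike the thresholded list used for g:Profiler, nothing is filtered out here. This is one reason the two analyses are not directly comparable.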
Figure 2. GSEA Report Summary
The top 5 upregulated gene sets are shown below.
Figure 2. Upregulated pathways
Figure 3. Upregulated pathways after non-thresholded analysis using GSEA
Figure 4. Downregulated pathways after non-thresholded analysis using GSEA
Figure 5. Top 5 upregulated pathways after thresholded analysis using g:Profiler
Figure 6. Top 5 downregulated pathways after thresholded analysis using g:Profiler
Pathways associated with hypoxia are downregulated in both analyses, and pathways associated with mitosis are upregulated in both. However, comparing the two methods is not straightforward, because their inputs are completely different. The thresholded analysis uses only the genes that pass the significance cutoff, whereas GSEA uses a ranked list of all genes. The cutoffs also differ: we set the p-value to 0.01 in part 2, while GSEA used its default of 0.05. Finally, GSEA and g:Profiler draw on different annotation databases, which leads to different results.
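Because the two tools name their terms differently, exact name matching underestimates how much they agree. Comparing at the level of themes (for example "hypoxia") is closer to the comparison made above. The term lists in this small base-R sketch are hypothetical and are not the actual results.

```r
# Hypothetical downregulated terms from each method (illustrative only)
gsea_down <- c("RESPONSE TO HYPOXIA", "GLYCOLYTIC PROCESS", "ANGIOGENESIS")
gprofiler_down <- c("response to hypoxia", "cellular response to hypoxia",
                    "regulation of angiogenesis")

# Exact matching after lower-casing: only identical term names count
shared_exact <- intersect(tolower(gsea_down), tolower(gprofiler_down))

# Theme matching: does each method report at least one term with the keyword?
theme_hits <- function(terms, keyword) {
  grep(keyword, terms, ignore.case = TRUE, value = TRUE)
}
themes <- c("hypoxia", "angiogenesis", "glycoly")
in_both <- vapply(themes, function(k) {
  length(theme_hits(gsea_down, k)) > 0 && length(theme_hits(gprofiler_down, k)) > 0
}, logical(1))

print(shared_exact)
print(in_both)
```

Only one term matches exactly. At the theme level, however, both hypoxia and angiogenesis are shared.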
To create the network, Cytoscape v3.9.1 was used [@shannon2003cytoscape] with the following parameters:
- FDR q-value cutoff: 0.01;
- "Filter genes by expression": selected;
- similarity metric: combined Jaccard (50%) and Overlap (50%);
- edge (similarity) cutoff: 0.375.
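An edge between two gene sets is drawn when their similarity is at or above the edge cutoff. This sketch uses the following definitions:

- Jaccard = |A ∩ B| / |A ∪ B|;
- Overlap = |A ∩ B| / min(|A|, |B|);
- combined coefficient = k × Overlap + (1 - k) × Jaccard. This form is assumed; with 50/50 weights the order of the two terms does not matter.

The gene sets are hypothetical.

```r
# Similarity coefficients between two gene sets (character vectors)
jaccard  <- function(a, b) length(intersect(a, b)) / length(union(a, b))
overlap  <- function(a, b) length(intersect(a, b)) / min(length(a), length(b))
combined <- function(a, b, k = 0.5) k * overlap(a, b) + (1 - k) * jaccard(a, b)

# Hypothetical toy gene sets
gene_sets <- list(
  setA = c("G1", "G2", "G3", "G4"),
  setB = c("G3", "G4", "G5", "G6"),
  setC = c("G1", "G2", "G3"),
  setD = c("G7", "G8")
)

# Score every pair of gene sets and keep pairs at or above the cutoff
set_pairs <- t(combn(names(gene_sets), 2))
scores <- apply(set_pairs, 1, function(p) combined(gene_sets[[p[1]]], gene_sets[[p[2]]]))
edges <- data.frame(from = set_pairs[, 1], to = set_pairs[, 2],
                    similarity = scores, stringsAsFactors = FALSE)
edges <- edges[edges$similarity >= 0.375, ]
print(edges)
```

Two pairs pass the cutoff: setA and setC (0.875) and setA and setB (about 0.417). The pair setB and setC (0.25) falls below 0.375, so it gets no edge.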
1. Enrichment Map
The resulting map contained 556 nodes (gene sets) and 7058 edges (connections between gene sets that share genes), using the thresholds noted above.
Figure 7. Network created using Cytoscape. Note that the upregulated gene sets cluster together, as do the downregulated gene sets.
2. Annotation. The network was annotated with AutoAnnotate using the following parameters:
- cluster algorithm: MCL;
- cluster edge weight column: None;
- label column: GS_DESCR;
- max words per label: 3;
- minimum word occurrence: 1;
- adjacent word bonus: 8.
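AutoAnnotate delegates MCL clustering to the clusterMaker2 app. The sketch below illustrates the MCL algorithm itself, not clusterMaker2's implementation, which exposes additional settings. It works in base R as follows:

1. Add self-loops and normalize each column.
2. Alternate expansion (matrix squaring) and inflation (element-wise power followed by re-normalization) until the matrix stops changing.

No edge weights are used, matching "Cluster Edge Weight Column: None". The toy graph is hypothetical.

```r
# Markov clustering (MCL) on an unweighted adjacency matrix
mcl_clusters <- function(adj, inflation = 2, max_iter = 100, tol = 1e-6) {
  m <- adj + diag(nrow(adj))               # add self-loops
  m <- sweep(m, 2, colSums(m), "/")        # make each column sum to 1
  for (i in seq_len(max_iter)) {
    expanded <- m %*% m                    # expansion: let flow spread along paths
    inflated <- expanded^inflation         # inflation: boost strong flows
    inflated <- sweep(inflated, 2, colSums(inflated), "/")
    delta <- max(abs(inflated - m))
    m <- inflated
    if (delta < tol) break
  }
  # Each node (column) joins the attractor row that holds most of its flow
  attractor <- apply(m, 2, which.max)
  clusters <- as.integer(factor(attractor, levels = unique(attractor)))
  names(clusters) <- colnames(adj)
  clusters
}

# Toy graph: two triangles (n1-n3 and n4-n6) joined by a single edge n3-n4
nodes <- paste0("n", 1:6)
adj <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
edge_list <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
adj[edge_list] <- 1
adj[edge_list[, 2:1]] <- 1                 # make the graph undirected

cl <- mcl_clusters(adj, inflation = 2)
print(cl)
```

With inflation 2, the two triangles are expected to come out as separate clusters. Higher inflation values generally produce smaller, tighter clusters.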
Figure 8. Network annotated using AutoAnnotate. Clusters of gene sets are outlined by yellow circles, and the annotation font size is proportional to cluster size.
3. Figure